Radical-Enhanced Chinese Character Embedding

نویسندگان

  • Yaming Sun
  • Lei Lin
  • Nan Yang
  • Zhenzhou Ji
  • Xiaolong Wang
چکیده

We present a method to leverage radical for learning Chinese character embedding. Radical is a semantic and phonetic component of Chinese character. It plays an important role as characters with the same radical usually have similar semantic meaning and grammatical usage. However, existing Chinese processing algorithms typically regard word or character as the basic unit but ignore the crucial radical information. In this paper, we fill this gap by leveraging radical for learning continuous representation of Chinese character. We develop a dedicated neural architecture to effectively learn character embedding and apply it on Chinese character similarity judgement and Chinese word segmentation. Experiment results show that our radical-enhanced method outperforms existing embedding learning algorithms on both tasks.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Radical Embedding: Delving Deeper to Chinese Radicals

Languages using Chinese characters are mostly processed at word level. Inspired by recent success of deep learning, we delve deeper to character and radical levels for Chinese language processing. We propose a new deep learning technique, called “radical embedding”, with justifications based on Chinese linguistics, and validate its feasibility and utility through a set of three experiments: two...

متن کامل

Radical-level Ideograph Encoder for RNN-based Sentiment Analysis of Chinese and Japanese

The character vocabulary can be very large in non-alphabetic languages such as Chinese and Japanese, which makes neural network models huge to process such languages. We explored a model for sentiment classification that takes the embeddings of the radicals of the Chinese characters, i.e, hanzi of Chinese and kanji of Japanese. Our model is composed of a CNN word feature encoder and a bi-direct...

متن کامل

Multi-Granularity Chinese Word Embedding

This paper considers the problem of learning Chinese word embeddings. In contrast to English, a Chinese word is usually composed of characters, and most of the characters themselves can be further divided into components such as radicals. While characters and radicals contain rich information and are capable of indicating semantic meanings of words, they have not been fully exploited by existin...

متن کامل

Component-Enhanced Chinese Character Embeddings

Distributed word representations are very useful for capturing semantic information and have been successfully applied in a variety of NLP tasks, especially on English. In this work, we innovatively develop two component-enhanced Chinese character embedding models and their bigram extensions. Distinguished from English word embeddings, our models explore the compositions of Chinese characters, ...

متن کامل

The temporal signatures of semantic and phonological activations for Chinese sublexical processing: an event-related potential study.

A large number of Chinese characters are made up by pairing a semantic radical and a phonetic radical. The phonetic radical usually gives a phonological clue for the pronunciation of the whole character but does not contribute to its meaning. Using an event-related potential (ERP) measurement, the present study was able to trace the very intricate interplay between phonological and semantic inf...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014